<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<HTML
><HEAD
><TITLE
>Collation Support</TITLE
><META
NAME="GENERATOR"
CONTENT="Modular DocBook HTML Stylesheet Version 1.79"><LINK
REV="MADE"
HREF="mailto:pgsql-docs@postgresql.org"><LINK
REL="HOME"
TITLE="PostgreSQL 9.1.2 Documentation"
HREF="index.html"><LINK
REL="UP"
TITLE="Localization"
HREF="charset.html"><LINK
REL="PREVIOUS"
TITLE="Locale Support"
HREF="locale.html"><LINK
REL="NEXT"
TITLE="Character Set Support"
HREF="multibyte.html"><LINK
REL="STYLESHEET"
TYPE="text/css"
HREF="stylesheet.css"><META
HTTP-EQUIV="Content-Type"
CONTENT="text/html; charset=ISO-8859-1"><META
NAME="creation"
CONTENT="2011-12-01T22:07:59"></HEAD
><BODY
CLASS="SECT1"
><DIV
CLASS="NAVHEADER"
><TABLE
SUMMARY="Header navigation table"
WIDTH="100%"
BORDER="0"
CELLPADDING="0"
CELLSPACING="0"
><TR
><TH
COLSPAN="5"
ALIGN="center"
VALIGN="bottom"
><A
HREF="index.html"
>PostgreSQL 9.1.2 Documentation</A
></TH
></TR
><TR
><TD
WIDTH="10%"
ALIGN="left"
VALIGN="top"
><A
TITLE="Locale Support"
HREF="locale.html"
ACCESSKEY="P"
>Prev</A
></TD
><TD
WIDTH="10%"
ALIGN="left"
VALIGN="top"
><A
HREF="charset.html"
ACCESSKEY="U"
>Up</A
></TD
><TD
WIDTH="60%"
ALIGN="center"
VALIGN="bottom"
>Chapter 22. Localization</TD
><TD
WIDTH="20%"
ALIGN="right"
VALIGN="top"
><A
TITLE="Character Set Support"
HREF="multibyte.html"
ACCESSKEY="N"
>Next</A
></TD
></TR
></TABLE
><HR
ALIGN="LEFT"
WIDTH="100%"></DIV
><DIV
CLASS="SECT1"
><H1
CLASS="SECT1"
><A
NAME="COLLATION"
>22.2. Collation Support</A
></H1
><P
>   The collation feature allows specifying the sort order and character
   classification behavior of data per-column, or even per-operation.
   This alleviates the restriction that the
   <TT
CLASS="SYMBOL"
>LC_COLLATE</TT
> and <TT
CLASS="SYMBOL"
>LC_CTYPE</TT
> settings
   of a database cannot be changed after its creation.
  </P
><DIV
CLASS="SECT2"
><H2
CLASS="SECT2"
><A
NAME="AEN31519"
>22.2.1. Concepts</A
></H2
><P
>    Conceptually, every expression of a collatable data type has a
    collation.  (The built-in collatable data types are
    <TT
CLASS="TYPE"
>text</TT
>, <TT
CLASS="TYPE"
>varchar</TT
>, and <TT
CLASS="TYPE"
>char</TT
>.
    User-defined base types can also be marked collatable, and of course
    a domain over a collatable data type is collatable.)  If the
    expression is a column reference, the collation of the expression is the
    defined collation of the column.  If the expression is a constant, the
    collation is the default collation of the data type of the
    constant.  The collation of a more complex expression is derived
    from the collations of its inputs, as described below.
   </P
><P
>    The collation of an expression can be the <SPAN
CLASS="QUOTE"
>"default"</SPAN
>
    collation, which means the locale settings defined for the
    database.  It is also possible for an expression's collation to be
    indeterminate.  In such cases, ordering operations and other
    operations that need to know the collation will fail.
   </P
><P
>    When the database system has to perform an ordering or a character
    classification, it uses the collation of the input expression.  This
    happens, for example, with <TT
CLASS="LITERAL"
>ORDER BY</TT
> clauses
    and function or operator calls such as <TT
CLASS="LITERAL"
>&lt;</TT
>.
    The collation to apply for an <TT
CLASS="LITERAL"
>ORDER BY</TT
> clause
    is simply the collation of the sort key.  The collation to apply for a
    function or operator call is derived from the arguments, as described
    below.  In addition to comparison operators, collations are taken into
    account by functions that convert between lower and upper case
    letters, such as <CODE
CLASS="FUNCTION"
>lower</CODE
>, <CODE
CLASS="FUNCTION"
>upper</CODE
>, and
    <CODE
CLASS="FUNCTION"
>initcap</CODE
>; by pattern matching operators; and by
    <CODE
CLASS="FUNCTION"
>to_char</CODE
> and related functions.
   </P
><P
>    For a function or operator call, the collation that is derived by
    examining the argument collations is used at run time for performing
    the specified operation.  If the result of the function or operator
    call is of a collatable data type, the collation is also used at parse
    time as the defined collation of the function or operator expression,
    in case there is a surrounding expression that requires knowledge of
    its collation.
   </P
><P
>    The <I
CLASS="FIRSTTERM"
>collation derivation</I
> of an expression can be
    implicit or explicit.  This distinction affects how collations are
    combined when multiple different collations appear in an
    expression.  An explicit collation derivation occurs when a
    <TT
CLASS="LITERAL"
>COLLATE</TT
> clause is used; all other collation
    derivations are implicit.  When multiple collations need to be
    combined, for example in a function call, the following rules are
    used:

    <P
></P
></P><OL
TYPE="1"
><LI
><P
>       If any input expression has an explicit collation derivation, then
       all explicitly derived collations among the input expressions must be
       the same, otherwise an error is raised.  If any explicitly
       derived collation is present, that is the result of the
       collation combination.
      </P
></LI
><LI
><P
>       Otherwise, all input expressions must have the same implicit
       collation derivation or the default collation.  If any non-default
       collation is present, that is the result of the collation combination.
       Otherwise, the result is the default collation.
      </P
></LI
><LI
><P
>       If there are conflicting non-default implicit collations among the
       input expressions, then the combination is deemed to have indeterminate
       collation.  This is not an error condition unless the particular
       function being invoked requires knowledge of the collation it should
       apply.  If it does, an error will be raised at run-time.
      </P
></LI
></OL
><P>

    For example, consider this table definition:
</P><PRE
CLASS="PROGRAMLISTING"
>CREATE TABLE test1 (
    a text COLLATE "de_DE",
    b text COLLATE "es_ES",
    ...
);</PRE
><P>

    Then in
</P><PRE
CLASS="PROGRAMLISTING"
>SELECT a &lt; 'foo' FROM test1;</PRE
><P>
    the <TT
CLASS="LITERAL"
>&lt;</TT
> comparison is performed according to
    <TT
CLASS="LITERAL"
>de_DE</TT
> rules, because the expression combines an
    implicitly derived collation with the default collation.  But in
</P><PRE
CLASS="PROGRAMLISTING"
>SELECT a &lt; ('foo' COLLATE "fr_FR") FROM test1;</PRE
><P>
    the comparison is performed using <TT
CLASS="LITERAL"
>fr_FR</TT
> rules,
    because the explicit collation derivation overrides the implicit one.
    Furthermore, given
</P><PRE
CLASS="PROGRAMLISTING"
>SELECT a &lt; b FROM test1;</PRE
><P>
    the parser cannot determine which collation to apply, since the
    <TT
CLASS="STRUCTFIELD"
>a</TT
> and <TT
CLASS="STRUCTFIELD"
>b</TT
> columns have conflicting
    implicit collations.  Since the <TT
CLASS="LITERAL"
>&lt;</TT
> operator
    does need to know which collation to use, this will result in an
    error.  The error can be resolved by attaching an explicit collation
    specifier to either input expression, thus:
</P><PRE
CLASS="PROGRAMLISTING"
>SELECT a &lt; b COLLATE "de_DE" FROM test1;</PRE
><P>
    or equivalently
</P><PRE
CLASS="PROGRAMLISTING"
>SELECT a COLLATE "de_DE" &lt; b FROM test1;</PRE
><P>
    On the other hand, the structurally similar case
</P><PRE
CLASS="PROGRAMLISTING"
>SELECT a || b FROM test1;</PRE
><P>
    does not result in an error, because the <TT
CLASS="LITERAL"
>||</TT
> operator
    does not care about collations: its result is the same regardless
    of the collation.
   </P
><P
>    The collation assigned to a function or operator's combined input
    expressions is also considered to apply to the function or operator's
    result, if the function or operator delivers a result of a collatable
    data type.  So, in
</P><PRE
CLASS="PROGRAMLISTING"
>SELECT * FROM test1 ORDER BY a || 'foo';</PRE
><P>
    the ordering will be done according to <TT
CLASS="LITERAL"
>de_DE</TT
> rules.
    But this query:
</P><PRE
CLASS="PROGRAMLISTING"
>SELECT * FROM test1 ORDER BY a || b;</PRE
><P>
    results in an error, because even though the <TT
CLASS="LITERAL"
>||</TT
> operator
    doesn't need to know a collation, the <TT
CLASS="LITERAL"
>ORDER BY</TT
> clause does.
    As before, the conflict can be resolved with an explicit collation
    specifier:
</P><PRE
CLASS="PROGRAMLISTING"
>SELECT * FROM test1 ORDER BY a || b COLLATE "fr_FR";</PRE
><P>
   </P
></DIV
><DIV
CLASS="SECT2"
><H2
CLASS="SECT2"
><A
NAME="AEN31567"
>22.2.2. Managing Collations</A
></H2
><P
>    A collation is an SQL schema object that maps an SQL name to
    operating system locales.  In particular, it maps to a combination
    of <TT
CLASS="SYMBOL"
>LC_COLLATE</TT
> and <TT
CLASS="SYMBOL"
>LC_CTYPE</TT
>.  (As
    the name would suggest, the main purpose of a collation is to set
    <TT
CLASS="SYMBOL"
>LC_COLLATE</TT
>, which controls the sort order.  But
    it is rarely necessary in practice to have an
    <TT
CLASS="SYMBOL"
>LC_CTYPE</TT
> setting that is different from
    <TT
CLASS="SYMBOL"
>LC_COLLATE</TT
>, so it is more convenient to collect
    these under one concept than to create another infrastructure for
    setting <TT
CLASS="SYMBOL"
>LC_CTYPE</TT
> per expression.)  Also, a collation
    is tied to a character set encoding (see <A
HREF="multibyte.html"
>Section 22.3</A
>).
    The same collation name may exist for different encodings.
   </P
><P
>    On all platforms, the collations named <TT
CLASS="LITERAL"
>default</TT
>,
    <TT
CLASS="LITERAL"
>C</TT
>, and <TT
CLASS="LITERAL"
>POSIX</TT
> are available.  Additional
    collations may be available depending on operating system support.
    The <TT
CLASS="LITERAL"
>default</TT
> collation selects the <TT
CLASS="SYMBOL"
>LC_COLLATE</TT
>
    and <TT
CLASS="SYMBOL"
>LC_CTYPE</TT
> values specified at database creation time.
    The <TT
CLASS="LITERAL"
>C</TT
> and <TT
CLASS="LITERAL"
>POSIX</TT
> collations both specify
    <SPAN
CLASS="QUOTE"
>"traditional C"</SPAN
> behavior, in which only the ASCII letters
    <SPAN
CLASS="QUOTE"
>"<TT
CLASS="LITERAL"
>A</TT
>"</SPAN
> through <SPAN
CLASS="QUOTE"
>"<TT
CLASS="LITERAL"
>Z</TT
>"</SPAN
>
    are treated as letters, and sorting is done strictly by character
    code byte values.
   </P
><P
>    If the operating system provides support for using multiple locales
    within a single program (<CODE
CLASS="FUNCTION"
>newlocale</CODE
> and related functions),
    then when a database cluster is initialized, <TT
CLASS="COMMAND"
>initdb</TT
>
    populates the system catalog <TT
CLASS="LITERAL"
>pg_collation</TT
> with
    collations based on all the locales it finds on the operating
    system at the time.  For example, the operating system might
    provide a locale named <TT
CLASS="LITERAL"
>de_DE.utf8</TT
>.
    <TT
CLASS="COMMAND"
>initdb</TT
> would then create a collation named
    <TT
CLASS="LITERAL"
>de_DE.utf8</TT
> for encoding <TT
CLASS="LITERAL"
>UTF8</TT
>
    that has both <TT
CLASS="SYMBOL"
>LC_COLLATE</TT
> and
    <TT
CLASS="SYMBOL"
>LC_CTYPE</TT
> set to <TT
CLASS="LITERAL"
>de_DE.utf8</TT
>.
    It will also create a collation with the <TT
CLASS="LITERAL"
>.utf8</TT
>
    tag stripped off the name.  So you could also use the collation
    under the name <TT
CLASS="LITERAL"
>de_DE</TT
>, which is less cumbersome
    to write and makes the name less encoding-dependent.  Note that,
    nevertheless, the initial set of collation names is
    platform-dependent.
   </P
><P
>    In case a collation is needed that has different values for
    <TT
CLASS="SYMBOL"
>LC_COLLATE</TT
> and <TT
CLASS="SYMBOL"
>LC_CTYPE</TT
>, a new
    collation may be created using
    the <A
HREF="sql-createcollation.html"
>CREATE COLLATION</A
> command.  That command
    can also be used to create a new collation from an existing
    collation, which can be useful to be able to use
    operating-system-independent collation names in applications.
   </P
><P
>    Within any particular database, only collations that use that
    database's encoding are of interest.  Other entries in
    <TT
CLASS="LITERAL"
>pg_collation</TT
> are ignored.  Thus, a stripped collation
    name such as <TT
CLASS="LITERAL"
>de_DE</TT
> can be considered unique
    within a given database even though it would not be unique globally.
    Use of the stripped collation names is recommendable, since it will
    make one less thing you need to change if you decide to change to
    another database encoding.  Note however that the <TT
CLASS="LITERAL"
>default</TT
>,
    <TT
CLASS="LITERAL"
>C</TT
>, and <TT
CLASS="LITERAL"
>POSIX</TT
> collations can be used
    regardless of the database encoding.
   </P
><P
>    <SPAN
CLASS="PRODUCTNAME"
>PostgreSQL</SPAN
> considers distinct collation
    objects to be incompatible even when they have identical properties.
    Thus for example,
</P><PRE
CLASS="PROGRAMLISTING"
>SELECT a COLLATE "C" &lt; b COLLATE "POSIX" FROM test1;</PRE
><P>
    will draw an error even though the <TT
CLASS="LITERAL"
>C</TT
> and <TT
CLASS="LITERAL"
>POSIX</TT
>
    collations have identical behaviors.  Mixing stripped and non-stripped
    collation names is therefore not recommended.
   </P
></DIV
></DIV
><DIV
CLASS="NAVFOOTER"
><HR
ALIGN="LEFT"
WIDTH="100%"><TABLE
SUMMARY="Footer navigation table"
WIDTH="100%"
BORDER="0"
CELLPADDING="0"
CELLSPACING="0"
><TR
><TD
WIDTH="33%"
ALIGN="left"
VALIGN="top"
><A
HREF="locale.html"
ACCESSKEY="P"
>Prev</A
></TD
><TD
WIDTH="34%"
ALIGN="center"
VALIGN="top"
><A
HREF="index.html"
ACCESSKEY="H"
>Home</A
></TD
><TD
WIDTH="33%"
ALIGN="right"
VALIGN="top"
><A
HREF="multibyte.html"
ACCESSKEY="N"
>Next</A
></TD
></TR
><TR
><TD
WIDTH="33%"
ALIGN="left"
VALIGN="top"
>Locale Support</TD
><TD
WIDTH="34%"
ALIGN="center"
VALIGN="top"
><A
HREF="charset.html"
ACCESSKEY="U"
>Up</A
></TD
><TD
WIDTH="33%"
ALIGN="right"
VALIGN="top"
>Character Set Support</TD
></TR
></TABLE
></DIV
></BODY
></HTML
>